-
Notifications
You must be signed in to change notification settings - Fork 691
Improve API of literal save and the snapshot (command line) tool #2507
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve API of literal save and the snapshot (command line) tool #2507
Conversation
docs/02.API-REFERENCE.md
Outdated
|
|
||
| Collect the used literals from the given source code and save them into a specific file in a list or C format. | ||
| These literals are generated by the parser, they are valid identifiers and none of them are magic string. | ||
| Collect the used literals from the given snapshot and save them into a buffer in a list or C format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... in list or C format.
docs/02.API-REFERENCE.md
Outdated
| Collect the used literals from the given source code and save them into a specific file in a list or C format. | ||
| These literals are generated by the parser, they are valid identifiers and none of them are magic string. | ||
| Collect the used literals from the given snapshot and save them into a buffer in a list or C format. | ||
| None of these literals are magic string. In C format only valid identifiers are collected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... are magic strings.
docs/02.API-REFERENCE.md
Outdated
| sizeof (save_literal_buffer) / sizeof (uint32_t), | ||
| true); | ||
| jerry_value_t generate_result; | ||
| generate_result = jerry_generate_snapshot (NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it needed to split this into two statements? I've checked, jerry_value_t generate_result = jerry_generate_snapshot (... fits in the line length limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not needed indeed, but I think it looks a bit better. I can change if you want. Both solutions are fine to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variables should be initialized at the time of their declaration whenever possible. This may be called a personal preference. It's no blocker but wanted to make a note.
jerry-core/api/jerry-snapshot.c
Outdated
| bool is_c_format) /**< format-flag */ | ||
| jerry_get_literals_from_snapshot (const uint32_t *snapshot_p, /**< input snapshot buffer */ | ||
| size_t snapshot_size, /**< size of the input snapshot buffer */ | ||
| uint32_t *lit_buf_p, /**< [out] buffer to save literals to */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is lit_buf_p an uint32_t * if it stores a string? It will be cast to uint8_t * in the code anyway. (Question nr.2: why not to char * or jerry_char_t *?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. It comes from the original code. I think it is because the snapshot command line tool, because we use an uint32_t output buffer for both snapshots and literals. char * would not be the best choice, because the output is not a zero terminated string and it can be misleading. IMHO jerry_char_t * would be the best choice. I'll update the PR.
jerry-main/main-unix-snapshot.c
Outdated
| sizeof (output_buffer) / sizeof (uint32_t), | ||
| is_save_literals_mode_in_c_format); | ||
| if (literal_buffer_size == 0) | ||
| uint32_t *lit_out_buf_p = output_buffer + snapshot_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
output_buffer is an uint32_t array, snapshot_size seems to be a expressed in bytes. Pointer arithmetics will cause the addition to be off by a factor of 4 than expected.
jerry-main/main-unix-snapshot.c
Outdated
| is_save_literals_mode_in_c_format); | ||
| if (literal_buffer_size == 0) | ||
| uint32_t *lit_out_buf_p = output_buffer + snapshot_size; | ||
| size_t lit_out_buf_sz = (sizeof (output_buffer) / sizeof (uint32_t)) - snapshot_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This expression also seems to be incorrect. sizeof (output_buffer) is already in bytes, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra note / question: What ensures that the output of jerry_get_literals_from_snapshot will fit in the buffer of size lit_out_buf_sz? And how could we detect if it didn't?
jerry-main/main-unix-snapshot.c
Outdated
| .help = "export literals found in parsed JS input (in list format)"), | ||
| CLI_OPT_DEF (.id = OPT_MERGE_LITERAL_C, .longopt = "save-literals-c-format", | ||
| .meta = "FILE", | ||
| .help = "export literals found in parsed JS input (in C source format)"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to have a separate subcommand for extracting literals from a snapshot instead of adding export options to both generate and merge subcommands? That would reduce code duplications also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer it this way, but we can change it. Do you mean something like this:
jerry-snapshot literal --format=list -o literal.list FILE_1 ... FILE_N ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was thinking something like this. It would have several benefits, IMO.
The subcommands of the snapshot tool would work as analogies to well known tools from the native domain, e.g., 'generate' is like a compiler, 'merge' resembles a linker, and this 'literal' (or however it is named) could be the counterpart of objdump. (Ofc, none of them are perfect matches, there are important differences everywhere.)
There is a reason why these build tools are separated. This way a user doesn't have to execute unnecessary functionalities to perform a given step. E.g., in this case, assumig that a snapshot is already present but the user forgot to ask for a literal dump at the time of its creation, they don't have to recreate the snapshot from source just to get that dump but it can be retrieved from the snapshot itself.
All this is also aligned with the linux command line tool philosophy: a tool (now, subcommand) should do one thing but do that right. (I.e., don't clone the same functionality into different tools but put it into a single one and leave the functionality composition via sequential execution or pipes or whatever to the user.)
75ebbed to
f1c1596
Compare
f1c1596 to
9809103
Compare
|
@akosthekiss thanks for your review. I've updated the PR. |
7e85b71 to
ecc0e2b
Compare
docs/02.API-REFERENCE.md
Outdated
| bool is_c_format); | ||
| jerry_get_literals_from_snapshot (const uint32_t *snapshot_p, | ||
| size_t snapshot_size, | ||
| uint8_t *lit_buf_p, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .c file declares this as jerry_char_t.
jerry-core/api/jerry-snapshot.c
Outdated
| ecma_collection_header_t *lit_pool_p = ecma_new_values_collection (); | ||
| ecma_save_literals_add_compiled_code (bytecode_data_p, lit_pool_p); | ||
| JERRY_ASSERT ((header_p->lit_table_offset % sizeof (uint32_t)) == 0); | ||
| const uint8_t *literal_base_p = (uint8_t *) (snapshot_data_p + header_p->lit_table_offset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
snapshot_data_p is already a uint8_t *. Is this explicit cast to the same type needed?
jerry-core/api/jerry-snapshot.c
Outdated
| ecma_free_value (JERRY_CONTEXT (error_value)); | ||
| return 0; | ||
| ecma_raise_type_error ("Invalid snapshot format"); | ||
| return ecma_create_error_reference_from_context (); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with this function here. But how can its return value be compatible with size_t? And even if it is a number, how can it have any error signalling power? If it is non-zero, it will be interpreted as a buffer size. If it is zero, well, it cannot encode anything useful then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, it looks like it is messed up after the rebase. I'll fix it.
jerry-core/api/jerry-snapshot.c
Outdated
| ecma_bytecode_deref (bytecode_data_p); | ||
| ecma_collection_header_t *lit_pool_p = ecma_new_values_collection (); | ||
| scan_snapshot_functions (snapshot_data_p + header_p->func_offsets[0], | ||
| snapshot_data_p + header_p->lit_table_offset, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this equivalent to literal_base_p?
jerry-main/main-unix-snapshot.c
Outdated
| { | ||
| is_c_format = true; | ||
| } | ||
| else if (strcmp ("list", fromat_str_p)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest using else if (!strcmp ("list", format_str_p)) { is_c_format = false; } else { ... return FAIL; }. Although list is the default format, an intentionally weird command line might cause counterintuitive behaviour with the current approach: jerry-snapshot --format c --format list will not switch back to list format. (A cli option may appear multiple times and the last one counts, usually.)
jerry-main/main-unix-snapshot.c
Outdated
| }; | ||
|
|
||
| /** | ||
| * Process 'literal-dump' command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
literal-dump as a subcommand name is a bit long to my taste but I let others comment on this. (dump, litdump, literals?)
jerry-main/main-unix-snapshot.c
Outdated
| if (literal_file_p == NULL) | ||
| if (number_of_files < 1) | ||
| { | ||
| jerry_port_log (JERRY_LOG_LEVEL_ERROR, "Error: at least one input files must be passed.\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...one input file... ...specified.
| { | ||
| /* The input contains more than one input snapshot file, so we must merge them first. */ | ||
| const char *error_p = NULL; | ||
| size_t merged_snapshot_size = jerry_merge_snapshots (snapshot_buffers, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? Cannot it be specified that literal dump works on one snapshot only? Doing merge in this subcommand seems to go against the 'do one thing but do that right' idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, yes it is necessary. This is how we usually going to use it. E.g.: in IoT.js. It doesn't save the merged snapshot and does not modify the input, so I think it does only one thing (gather the literals). I don't see any problem with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for the auto-merge. Snapshot tool is not intended to run in low-end, so adding convenience functions should not hurt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side note: this code could become simpler if jerry_merge_snapshots could deal with corner cases like number_of_files=1. I recall that there was some discussion about this back when the snapshot tool was first implemented. one more item on a todo list?
|
|
||
| if (literals_file_name_p == NULL) | ||
| { | ||
| literals_file_name_p = is_c_format ? "literals.h" : "literals.list"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cli option comment documents the c format case to write its output to literals.c not .h.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've fixed the option comment.
ecc0e2b to
2547ed9
Compare
|
@akosthekiss thanks for your review. I've updated the PR. |
| Collect the used literals from the given source code and save them into a specific file in a list or C format. | ||
| These literals are generated by the parser, they are valid identifiers and none of them are magic string. | ||
| Collect the used literals from the given snapshot and save them into a buffer in list or C format. | ||
| None of these literals are magic strings. In C format only valid identifiers are collected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for the C limitation? It would not be bad to control this with a flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Can be done later if backward incompatible API changes can be avoided)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is that special unicode characters break the C format. It is deficiency of the current version (latest master) as well. I'd like to work on it in a followup if it is good for you.
| */ | ||
| static void | ||
| scan_snapshot_functions (const uint8_t *buffer_p, /**< snapshot buffer start */ | ||
| const uint8_t *buffer_end_p, /**< snapshot buffer end */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I don't really like removing this, but since this is a static function we can reintroduce it later if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me hm, too. I didn't want to suggest the removal of this argument, although I see how my comment has led here. I should dive a bit deeper into snapshot functions to check the semantics behind buffer_end_p and literal_base_p, where and when and how they are used, why they are the same at all/most call sites, how can the literal data start after the snapshot buffer, etc. If they are really always equal then that may hint at a conceptual problem (or that may be too harsh, something that could be optimised, or named better, etc).
So I +1 now to no removal and to switch back to the original redundant arguments. (But someone could put this on a todo list and take a closer look.)
| if (snapshot_size <= sizeof (jerry_snapshot_header_t)) | ||
| { | ||
| ecma_free_value (JERRY_CONTEXT (error_value)); | ||
| /* Invalid snapshot format */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a few sanity checks, for example the jerry magic string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed now, I've added more checks.
| { | ||
| /* The input contains more than one input snapshot file, so we must merge them first. */ | ||
| const char *error_p = NULL; | ||
| size_t merged_snapshot_size = jerry_merge_snapshots (snapshot_buffers, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for the auto-merge. Snapshot tool is not intended to run in low-end, so adding convenience functions should not hurt.
57d93e5 to
1cafb83
Compare
docs/02.API-REFERENCE.md
Outdated
| { | ||
| FILE *literal_file_p = fopen ("literals.txt", "w"); | ||
| fwrite (save_literal_buffer, sizeof (uint8_t), literal_sizes, literal_file_p); | ||
| FILE *literal_file_p = fopen ("literals.txt", "wb"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: jerry_get_literals_from_snapshot is called with is_c_format=true, so the .txt extension is a bit counterintuitive in the example. (this is nothing functional, just an observation)
| */ | ||
| static void | ||
| scan_snapshot_functions (const uint8_t *buffer_p, /**< snapshot buffer start */ | ||
| const uint8_t *buffer_end_p, /**< snapshot buffer end */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me hm, too. I didn't want to suggest the removal of this argument, although I see how my comment has led here. I should dive a bit deeper into snapshot functions to check the semantics behind buffer_end_p and literal_base_p, where and when and how they are used, why they are the same at all/most call sites, how can the literal data start after the snapshot buffer, etc. If they are really always equal then that may hint at a conceptual problem (or that may be too harsh, something that could be optimised, or named better, etc).
So I +1 now to no removal and to switch back to the original redundant arguments. (But someone could put this on a todo list and take a closer look.)
| { | ||
| /* The input contains more than one input snapshot file, so we must merge them first. */ | ||
| const char *error_p = NULL; | ||
| size_t merged_snapshot_size = jerry_merge_snapshots (snapshot_buffers, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side note: this code could become simpler if jerry_merge_snapshots could deal with corner cases like number_of_files=1. I recall that there was some discussion about this back when the snapshot tool was first implemented. one more item on a todo list?
| jerry_port_log (JERRY_LOG_LEVEL_ERROR, "Error: cannot open file: '%s'\n", literals_file_name_p); | ||
| jerry_cleanup (); | ||
| return JERRY_STANDALONE_EXIT_CODE_FAIL; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the if (...) { log and return } ... (without an else branch) style is used all over the code, and could also be used here: if (file_p == NULL) { bail out }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Removed 'jerry_parse_and_save_literals' and introduced 'jerry_get_literals_from_snapshot' instead which works on snapshot buffers rather than source code. Added literal saving feature to snapshot merge in the snapshot command line tool. Also added missing 'jerry_cleanup()' calls to the snapshot tool. Improved the console messages of the snapshot tool. Based on previous work of Tamas Zakor <ztamas@inf.u-szeged.hu> JerryScript-DCO-1.0-Signed-off-by: László Langó llango.u-szeged@partner.samsung.com
1cafb83 to
8d25a6f
Compare
|
@zherczeg @akosthekiss the PR is updated. |
akosthekiss
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
zherczeg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Removed 'jerry_parse_and_save_literals' and introduced
'jerry_get_literals_from_snapshot' instead which works
on snapshot buffers rather than source code. Added literal
saving feature to snapshot merge in the snapshot command
line tool. Also added missing 'jerry_cleanup()' calls to the
snapshot tool. Improved the console messages of the snapshot
tool.
Based on previous work of Tamas Zakor ztamas@inf.u-szeged.hu
JerryScript-DCO-1.0-Signed-off-by: László Langó llango.u-szeged@partner.samsung.com